Summary.

Unlike probability theory, which is based on additivity, statistical inference seems to hesitate
between "Additivity" and a so-called "Maxitivity" approach. After a brief overview of three types
of principles for any (parametric) statistical theory and the proof that these principles are
mutually exclusive, the paper shows that two kinds of support measures are conceivable: an
additive one and a maxitive one (based on maximization operators). Unfortunately, neither of
them is able to cope with the ignorance part of the statistical experiment and, at the same time,
with the partial information given through the structure of the data. To conclude, the author
promotes the combined use of both approaches, as an efficient middle-of-the-road position for
the statistician.

Keywords: Principles of statistics, Support measures, Maxitivity property, Statistical paradoxes



1. Introduction 

In this paper, we restrict ourselves to parametric statistical models. In doing so, we know
that we leave aside a large part of statistics, but we think that equivalent reflections can
be made for nonparametric statistics. Each statistical theory tries to answer the same
basic question: "What can we say about the underlying hypotheses, from the observed data
information we get?" The different schools of inference have succeeded in giving an answer
to the question, as long as one accepts some principles related to these schools. The Classical
approach is best if one looks for long-run properties. Bayesian inference should be chosen if
one has meaningful proper prior information over the parameter space. Structural inference
is to be used for transformation models, etc. But there is not always an evident prior to choose,
and the Bayesian approach is then difficult to apply; or the data come from a unique
and non-replicable experiment, and the Classical approach is no longer appropriate.

It would be naive to believe that a single inference theory could be suitable for all inference
problems. In that sense, we totally agree with Kalbfleisch and Sprott (1970): "In fact, the
main criticism to be directed at the study of statistical inference today is the slavish adherence
to rigid dogmas and principles (e.g. Bayes theorem, likelihood principle, admissibility, etc.)
which is characteristic of the various schools of inference ... To claim that all problems of
inference have been, or even can be, solved by one overriding principle seems to us naive."
This appears to close definitively the search for a general statistical inference school. What
we try to do in this paper is to go deeper into the formal understanding of such a failure.
We do that by focusing on the opposition between the way to handle ignorance on one side
and structural data information on the other side. For that, it is important to look at the
principles underlying the various schools of inference.



2. Three sets of principles : a brief overview of Statistical principles 

2.1. Notations 

We represent a parametric statistical experiment by means of the following model:

$\mathcal{M}(\mathcal{X}, \Theta, p_\theta(x), \mu(x))$ where $\mathcal{X}$ is the sample space,
    $\Theta$ the parameter space,
    $p_\theta(x)$ the density family with respect to $\mu(x)$, and
    $\mu(x)$ a $\sigma$-finite measure over $\mathcal{X}$
    (usually the Lebesgue or the counting measure),
plus the knowledge of the observed data "$x \in E$".

Many principles will not be mentioned here because they are mere consequences of general
principles such as the Likelihood or Invariance ones. We think, for instance, of the
Mathematical Equivalence principle (Birnbaum (1964)), which states that our inference should
be independent of any one-to-one transformation of the sample space. This principle is
a corollary of the Likelihood principle.

2.2. First set : invariance concerning the parameter space

2.2.1. The (Strong) Invariance principle $\mathcal{I}$

$\mathcal{I}$ : Let $\mathcal{M}_1(\mathcal{X}, \Theta, p_\theta(x), \mu(x))$ and
$\mathcal{M}_2(\mathcal{X}', \Theta', p'_{\theta'}(x'), \mu'(x'))$ be two different models for
the same experiment, connected by two functions $f : \mathcal{X} \to \mathcal{X}'$ and
$g : \Theta \to \Theta'$ such that $p_\theta(\{x : f(x) = x'_o\}) = p'_{g(\theta)}(x'_o)$ and
$\mu(\{x : f(x) = x'_o\}) = \mu'(x'_o)$, $\forall \theta \in \Theta$ and $\forall x'_o \in \mathcal{X}'$.
The Invariance principle states that equivalent inference about $g(\theta)$ should be made from
the first model given the knowledge "$f(x) = x'_o$" as from the second model given the same
observation "$x' = x'_o$".

To better understand the Invariance principle, let us consider the Normal model $N(\mu, \sigma^2)$.
Suppose we only observe the value of the standard deviation $s = s_o$. The Invariance
principle states that inference about $\sigma$ given the observed $s_o$ should be the same whether
one uses the Normal model $N(\mu, \sigma^2)$ or the Chi-square distribution for $s^2$. The Invariance
principle requires that statistical inference does not depend on the choice of parameterization
for the model. A consequence of this principle is the invariance of inference under one-to-one
transformations of the parameter. This principle is advocated by many authors, see for
instance Hartigan (1967). It is at the heart of the paradoxes raised by Dawid et al. (1973)
against Bayesian and Structural inference. See also the old Bertrand-Von Mises paradox
about the choice of the ratio "wine-water" versus "water-wine" as our parameterization
within a uniform model (Von Mises (1939)).
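The wine-water paradox can be made concrete with a small computation. The sketch below is
only an illustration: the bounds 1/2 and 2 for the ratio, and the helper `uniform_prob`, are
assumptions introduced here and are not taken from Von Mises. It puts a uniform (additive)
measure first on the wine-water ratio, then on the water-wine ratio, and evaluates the same
event under both.

```python
from fractions import Fraction

# Assumed bounds: the ratio is only known to lie between 1/2 and 2.
LOW, HIGH = Fraction(1, 2), Fraction(2)


def uniform_prob(a, b, low=LOW, high=HIGH):
    """Probability of the interval [a, b] under a uniform law on [low, high]."""
    a, b = max(a, low), min(b, high)
    return max(b - a, Fraction(0)) / (high - low)


# Event: "at most as much wine as water", i.e. wine/water <= 1.
p_wine_water = uniform_prob(LOW, Fraction(1))    # uniform on wine/water
# The same event in the other parameterization: water/wine >= 1.
p_water_wine = uniform_prob(Fraction(1), HIGH)   # uniform on water/wine

print(p_wine_water, p_water_wine)  # 1/3 2/3
```

Under these assumed bounds, the same event receives probability 1/3 in one parameterization
and 2/3 in the other, which is the kind of dependence on the parameterization that the
Invariance principle rules out.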

2.3. Second set : invariance concerning the sample space

2.3.1. The Censoring principle $\mathcal{CE}$

$\mathcal{CE}$ : For any specified outcome $x_o$ of an experiment
$\mathcal{M}(\mathcal{X}, \Theta, p_\theta(x), \mu(x))$, our statistical
evidence is fully characterized by the function $p_\theta(x_o)$, $\theta \in \Theta$, without further
reference to $\mathcal{M}$ or $x_o$, i.e. all our statistical information is contained in the
likelihood function (Birnbaum (1964)).



$\mathcal{CE}$ was first proposed by Pratt (1962) by means of an example: if an accurate voltmeter
gave a reading of 87, does it matter, for the interpretation of this reading (assumed
error-free), whether the meter's range was bounded by 1,000 or by 100?



2.3.2. The Stopping Rule principle $\mathcal{ST}$

$\mathcal{ST}$ : The Stopping Rule principle states that the sampling design is irrelevant to
statistical inference at the stage of data analysis.

This principle is formally equivalent to the Censoring principle, if one considers the following
stopping rule: stop the experiment as soon as "$x_o$" is observed. $\mathcal{ST}$ can be accepted if
one is working with an experiment which will be performed only once. It is certainly
not a satisfactory principle for long-run sampling experiments, which are the basis of classical
inference.



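2.3.3. The (Strong) Likelihood principle $\mathcal{L}$

$\mathcal{L}$ : Suppose a statistical experiment is characterized by two different models with
common parameter space: $\mathcal{M}_1(\mathcal{X}, \Theta, p_\theta(x), \mu(x))$ and
$\mathcal{M}_2(\mathcal{X}', \Theta, p'_\theta(x'), \mu'(x'))$ such that
$p_\theta(x_o) = c \cdot p'_\theta(x'_o)$ for each $\theta$ in $\Theta$, for some $x_o$ in $\mathcal{X}$,
$x'_o$ in $\mathcal{X}'$ and some constant $c > 0$. Then
$\mathcal{L}$ states that the same inference should be made about $\theta$ whether $x_o$ or $x'_o$ is observed.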
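A standard textbook illustration of this principle, not taken from the present paper, compares
binomial and negative binomial sampling. The sketch below assumes that 9 heads and 3 tails were
observed; the function names and the grid of values for the parameter are illustrative choices.
Under a fixed number of 12 tosses and under the rule "toss until the third tail", the two
likelihoods differ only by a constant factor, so the principle asks for the same inference.

```python
from math import comb


def binomial_lik(theta, heads=9, tails=3):
    """Likelihood when the total number of tosses (heads + tails) is fixed."""
    n = heads + tails
    return comb(n, heads) * theta**heads * (1 - theta)**tails


def negbinomial_lik(theta, heads=9, tails=3):
    """Likelihood when tossing stops at the `tails`-th tail (last toss is a tail)."""
    return comb(heads + tails - 1, heads) * theta**heads * (1 - theta)**tails


thetas = [0.1, 0.3, 0.5, 0.75, 0.9]
ratios = [binomial_lik(t) / negbinomial_lik(t) for t in thetas]

# The ratio does not depend on theta: it is the constant c of the principle.
print(ratios)
```

The constant here is comb(12, 9) / comb(11, 9) = 220 / 55 = 4, so both sampling designs lead
to the same relative likelihood function even though their sample spaces differ.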

The Likelihood principle says that all the relevant information for inference about $\theta$ is
contained in the sole knowledge of the relative likelihood function. This principle is
advocated and criticized by many statisticians. See for instance Fisher (1950), Birnbaum
(1962, 1964), Barnard (1967), Basu (1973), Berger (1985) or Berger and Wolpert (1988).
Birnbaum (1964) proved that $\mathcal{L}$ is equivalent to the Sufficiency principle $\mathcal{S}$
(see next section) together with $\mathcal{CE}$, or to $\mathcal{S}$ together with $\mathcal{ST}$.



2.4. Third set : Reduction-type principles 

2.4.1. The Reduction principle $\mathcal{R}$

$\mathcal{R}$ : In logic, if $A \Rightarrow C$ and $B \Rightarrow C$, then $(A \cup B) \Rightarrow C$.
In statistics, one has a similar principle. Let $I(A)$ be the inferential information contained
in the observation $A$ [or some statistical inference made from $A$]. If the data $A$ and $B$ lead
to the same inference $I(A) = I(B)$, one should perform equivalent inference from the observation
of $A \cup B$: i.e. if $A \Rightarrow I(A)$ and $B \Rightarrow I(B) = I(A)$, then
$(A \cup B) \Rightarrow I(A \cup B) = I(A) = I(B)$.

This Reduction principle was proposed by Dawid (1977). It gives a general framework for
all the (partial) Sufficiency or Conditionality principles.



2.4.2. The Sufficiency principle $\mathcal{S}$

$\mathcal{S}$ : In an experiment $\mathcal{M}(\mathcal{X}, \Theta, p_\theta(x), \mu(x))$, we get the
same information about $\theta$ if we observe the realization $x_o$ or only the value of a
sufficient statistic $T(x_o) = t_o$.
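As an illustration, assume n independent Bernoulli tosses with unknown probability of success,
a setting chosen here only for simplicity; the number of successes is then a sufficient
statistic. The sketch below (all function names are hypothetical) checks on a grid that two
samples sharing the same value of this statistic give the same likelihood function.

```python
def bernoulli_lik(theta, sample):
    """Likelihood of an ordered 0/1 sample under independent Bernoulli(theta) draws."""
    lik = 1.0
    for x in sample:
        lik *= theta if x == 1 else 1 - theta
    return lik


def lik_from_statistic(theta, t, n):
    """The same likelihood, written through the sufficient statistic t = sum of the sample."""
    return theta**t * (1 - theta)**(n - t)


sample_a = [1, 1, 0, 1, 0]   # T = 3
sample_b = [0, 1, 1, 0, 1]   # T = 3, different ordering
grid = [0.1, 0.25, 0.5, 0.8]

for theta in grid:
    la = bernoulli_lik(theta, sample_a)
    lb = bernoulli_lik(theta, sample_b)
    lt = lik_from_statistic(theta, sum(sample_a), len(sample_a))
    print(theta, la, lb, lt)  # the three values agree up to rounding
```

Because the likelihood depends on the data only through the number of successes, knowing the
full sequence adds nothing to the inference about the parameter in this assumed model.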

This principle is certainly the most widely accepted principle in statistics. Together with
$\mathcal{CE}$ or $\mathcal{ST}$, it implies that the likelihood function is only relevant for
inference up to a proportionality constant.

2.4.3. The Conditionality principle $\mathcal{CO}$

$\mathcal{CO}$ : Suppose we have an experiment $\mathcal{M}(\mathcal{X}, \Theta, p_\theta(x), \mu(x))$
and a maximal ancillary statistic $T(x)$. $T$ is ancillary if $p_\theta(T(x))$ is independent of
$\theta$. Then our inference about $\theta$ should be done through the conditional probability
$p_\theta(x | T(x))$.



$\mathcal{CO}$ was studied, among others, by Cox (1958), Barndorff-Nielsen (1971) and Birnbaum
(1962), who proved that the Sufficiency principle $\mathcal{S}$ together with the Conditionality
principle $\mathcal{CO}$ implies the Likelihood principle $\mathcal{L}$.



2.4.4. The Partial Nonformation principles $\mathcal{PN}$

$\mathcal{PS}$ [Partial Sufficiency principles] : Let $T(x)$ be a partial sufficient statistic, in
some specified sense, like B-, S-, M-, K-, I- or L-sufficiency. See Barndorff-Nielsen (1971),
Remon (1984), Cano Sanchez et al. (1989) or Jorgensen (1993) for definitions of partial
sufficiency. All the Partial Sufficiency principles state that one gets the same inferential
information about some parameter of interest from the knowledge of "$x_o$" or "$T(x_o)$", and
that one has to do inference through the marginal distribution $p_\theta(T(x))$.

Equivalent Partial Conditionality principles $\mathcal{PC}$ require that our inference should be
done through the conditional distribution given the observation of some B-, S-, ... ancillary
statistic. Barndorff-Nielsen (1978) introduced the concept of nonformation, which generalizes
both notions of partial sufficiency and partial ancillarity, and leads to Partial Nonformation
principles $\mathcal{PN}$.



2.5. Summary

All these statistical principles can be summarized in three principles:

• The Invariance principle $\mathcal{I}$ about the choice of parameterization;

• The Likelihood principle $\mathcal{L}$ about the choice of the reference sample space, which
is equivalent to the Censoring $\mathcal{CE}$ or Stopping Rule $\mathcal{ST}$ principle together
with the Sufficiency principle $\mathcal{S}$;

• The Reduction principle $\mathcal{R}$, which generalizes the Partial Nonformation principles
$\mathcal{PN}$ about the kind of information one has to consider in the data ("$x_o$" or "$T(x_o)$"?).

The next section will discuss the logic of ignorance versus structural information with respect
to the best choice for a support measure over the hypotheses space $\Theta$.



3. Is there a good choice for a support measure with respect to the hypotheses ? 



3.1. Introduction 

In this paper, we use the general term support measure to express the support that the
observed data give to some unknown hypothesis. When looking for support measures
in the theories of ignorance or uncertainty, one finds many proposals. Let us just
mention here Laplace's (1812) ancient inverse probability theory (see Dale (1999) or
Fienberg (2006)), Dempster-Shafer's belief function (Shafer, 1976), the classical Bayesian
a posteriori probability, structural inference (Fraser, 1968), the theory of possibility
(Zadeh, 1978; Dubois and Prade, 2007), the plausibility measures (Friedman and Halpern, 1995)
or the recent general uncertainty theory (Zadeh, 2005).



3.2. The case of the non informative Bayesian priors 

It is well known that additive priors, as proposed by the Bayesian theory, are not suitable
for expressing absence of knowledge about hypotheses. See Shafer (1976): if we have no
information about three hypotheses $H_1$, $H_2$ and $H_3$, we cannot claim a better
knowledge about $H_1 \cup H_2$ than about $H_3$ merely because we can add these small pieces of
(non)information. Another example is the one proposed by Bernardo (1979): we toss a
coin and we wish to do inference about its bias through the parameter of interest
$\varphi = |\theta - \frac{1}{2}|$, where $\theta$ is the probability of observing "Head". We know
that the coin is either fair ($H_1 : \theta = \frac{1}{2}$), double-headed ($H_2 : \theta = 1$) or
double-tailed ($H_3 : \theta = 0$). We observe $x_o$ = "Head" $\cup$ "Tail", i.e. we have no
information coming from the data. The likelihood function is then $l(\theta | x_o) = 1$,
$\forall \theta \in \Theta$. If we express our ignorance about $\theta$ through an
additive uniform measure, $p(H_i) = \frac{1}{3}$, $i = 1, 2, 3$, we therefore state that the hypothesis
$H_1 : \varphi = 0$ is half as likely as the hypothesis $H_a : \varphi \neq 0$. This contradicts the
situation of ignorance.
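The arithmetic behind this coin example can be checked directly. The following sketch is an
illustration only; the dictionary names are hypothetical. It pushes the uniform additive
measure on the three hypotheses forward to the parameter of interest, the absolute distance
between the probability of "Head" and 1/2.

```python
from collections import defaultdict
from fractions import Fraction

# The three hypotheses and their value of theta = P("Head").
theta_values = {"H1": Fraction(1, 2), "H2": Fraction(1), "H3": Fraction(0)}
# Additive "ignorance" prior: uniform over the three hypotheses.
prior_theta = {h: Fraction(1, 3) for h in theta_values}


def phi(theta):
    """Parameter of interest: distance of theta from fairness."""
    return abs(theta - Fraction(1, 2))


# Induced additive measure on phi: masses of hypotheses with equal phi add up.
prior_phi = defaultdict(Fraction)
for h, theta in theta_values.items():
    prior_phi[phi(theta)] += prior_theta[h]

print(dict(prior_phi))  # phi = 0 gets 1/3, phi = 1/2 gets 2/3
```

Fairness receives mass 1/3 and unfairness 2/3 before any informative datum is seen, purely as
a by-product of adding the masses of the double-headed and double-tailed hypotheses.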



Dawid et al. (1973) and Stone (1976) have proposed many paradoxes against the additive
nature of the Bayesian prior, especially in the context of lack of information. Jeffreys
worked extensively to find non-informative Bayesian priors. In his paper about the history of
Bayesian inference, Fienberg (2006) writes that trying to 'derive "objective" priors that
expressed ignorance or lack of knowledge' is like trying 'to grasp the holy grail that had
eluded statisticians since the days of Laplace'. In fact, we can broaden the scope of the
incompatibility between ignorance and additivity to situations where partial information
is available, i.e. to any kind of support measure.



3.3. The incompatibility between "additivity" and the logic of ignorance 

Our knowledge (a priori or a posteriori) about the "true" unknown hypothesis $\theta_o \in \Theta$
can be in some way informative. This does not mean that our support measure about this
hypothesis behaves like a probability measure. Let us denote by $S[\theta \in \Theta_1 | E]$ the
support measure describing the likelihood the observation $E$ gives to the hypothesis
$\theta \in \Theta_1$. A support measure, like any plausibility measure (Friedman and Halpern,
1995), has to satisfy three axioms:

• $S[\theta \in \Theta_1 | E] = 0$ if $E \Rightarrow (\theta \notin \Theta_1)$

• $S[\theta \in \Theta_1 | E] = 1$ if $E \Rightarrow (\theta \in \Theta_1)$

• $\Theta_2 \subset \Theta_1 \Rightarrow S[\theta \in \Theta_2 | E] \leq S[\theta \in \Theta_1 | E]$
[monotonicity of the support function]

The problem in choosing a support measure on $\Theta$ is that this measure should always handle
a part of ignorance. Indeed, even when it is an a posteriori support measure over $\Theta$, there
will be hypotheses $\theta_i$ with equivalent support from the observed data (through the
likelihood function, for instance), and the support measure will have to manage this ignorance
between these $\theta_i$. Once again, as in Bernardo's coin example, this cannot be done
by an additive support measure. Let us prove this incompatibility as a consequence of the
Invariance $\mathcal{I}$ and Likelihood $\mathcal{L}$ principles.

Suppose that we express our statistical information about $\theta$ by means of an additive
posterior support measure $S[\theta | E]$. Because of the Invariance $\mathcal{I}$ and Likelihood
$\mathcal{L}$ principles, two $\theta$-values, $\theta_1$ and $\theta_2$, having the same relative
likelihood cannot be distinguished. To prove that, one has just to consider the function
$g(\theta)$ used in $\mathcal{I}$ as the permutation of $\theta_1$ and $\theta_2$. This implies that
$\mu_1 = S[\theta_1 | E] = S[\theta_2 | E] = \mu_2$. The logic of ignorance requires equivalent
inference for $\theta_1 \cup \theta_2$ as for $\theta_1$. Considering $S[\theta | E]$ as additive, one
gets: $S[\theta_1 \cup \theta_2 | E] = \mu_1 + \mu_2 = S[\theta_1 | E] = \mu_1$.
Hence, $\mu_1 = \mu_2 = 0$, which is far from convincing. All this reasoning about the
consequences of $\mathcal{I}$ and $\mathcal{L}$ was already mentioned, in similar terms, by
Hartigan (1967).

As it is clear that additive support measures are incompatible with $\mathcal{I}$ and
$\mathcal{L}$, one might think that non-additive support measures, like the ones proposed in
possibility theory, will be the correct choice. The next section shows that support measures
built on maximization (or minimization) are also to be questioned.



3.4. The case of the possibility measure and its "Maxitivity" property 

The theory of possibility, as well as the theory of plausibility, proposes measures defined
in terms of maximization or minimization. Dubois and Prade (2007) have introduced the
pretty terms of "Maxitivity" and "Maxitive measure" in reference to the additivity property
of probability measures. For instance, the possibility measure for the state
$A \subset \mathcal{S}$ is denoted by $\Pi(A)$ and defined by:

$$\Pi(A) = \sup_{s \in A} \pi(s)$$

where $\pi : \mathcal{S} \to [0, 1]$ is a possibility distribution for the states $s \in \mathcal{S}$.

A necessity measure can be defined for $A \subset \mathcal{S}$ by
$N(A) = \inf_{s \notin A} (1 - \pi(s)) = 1 - \Pi(A^c)$.
One gets the following "maxitivity" properties:

$$\Pi(A \cup B) = \max(\Pi(A), \Pi(B))$$
$$N(A \cap B) = \min(N(A), N(B))$$
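These two properties can be verified by brute force on a small example. The sketch below
assumes a hypothetical four-state space and a hypothetical possibility distribution whose
largest value is 1; it uses the standard definition of necessity as one minus the possibility
of the complement, and checks both properties over every pair of events.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite state space and possibility distribution (largest value is 1).
STATES = ("a", "b", "c", "d")
PI = {"a": Fraction(1), "b": Fraction(7, 10), "c": Fraction(2, 5), "d": Fraction(0)}


def possibility(event):
    """Pi(A): largest pi(s) over s in A, with Pi(empty set) = 0."""
    return max((PI[s] for s in event), default=Fraction(0))


def necessity(event):
    """N(A) = 1 - Pi(complement of A)."""
    return 1 - possibility(set(STATES) - set(event))


def all_events():
    """Every subset of STATES, as frozensets."""
    return [frozenset(c) for r in range(len(STATES) + 1)
            for c in combinations(STATES, r)]


EVENTS = all_events()
is_maxitive = all(possibility(a | b) == max(possibility(a), possibility(b))
                  for a in EVENTS for b in EVENTS)
is_minitive = all(necessity(a & b) == min(necessity(a), necessity(b))
                  for a in EVENTS for b in EVENTS)

print(is_maxitive, is_minitive)                            # True True
print(possibility({"b", "c"}), necessity({"a", "b"}))      # 7/10 3/5
```

Unlike a probability, the possibility of a union never exceeds the larger of the two
possibilities, so pooling equally supported states does not increase their joint support.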



See Dubois and Prade (2007) and Sigarretta et al. (2007) for detailed explanations about
possibility and plausibility measures. Let us note that $\Pi^*(A) = (\Pi(A) + N(A))/2$ is
still a possibility measure with the additional properties that $\Pi^*(A) = 0$ is equivalent to
the impossibility of $A$, and $\Pi^*(A) = 1$ to the certainty of $A$. This can be interesting for
comparison with a posteriori Bayesian probability, but it will not be used here.

3.5. The impossibility of a "maxitive" support measure satisfying $\mathcal{I}$, $\mathcal{L}$ and $\mathcal{R}$

Let us define our support measure in the framework of the theory of possibility, but in
relation to the relative likelihood function $l(\theta; E)$:

$$S[\theta \in \Theta_0 | E] = \sup_{\theta \in \Theta_0} l(\theta; E) = \frac{\sup_{\theta \in \Theta_0} p_\theta(E)}{\sup_{\theta \in \Theta} p_\theta(E)}$$
It is clear that such a possibility measure satisfies the Invariance and Likelihood principles.
However, the Reduction principle is not satisfied. Indeed, such a support measure based on
the sole likelihood function is incompatible with the Reduction principle, as far as the partial
sufficiency principle is concerned. Let us consider Bernardo's (1979) coin example again.
Remember that our parameter of interest is $\varphi = |\theta - \frac{1}{2}|$ where $\theta$ is the
probability of observing "Head", and that the coin is known to be either fair
($H_1 : \theta = \frac{1}{2}$), double-headed ($H_2 : \theta = 1$) or double-tailed
($H_3 : \theta = 0$). This time, we observe $x_o$ = "Head". The likelihood function
is $l(\theta | x_o) = \theta$, $\forall \theta \in \Theta$. So, by definition of our support measure,
one gets:

$S[\theta \mid \text{"Head"}] = 1 - S[\theta \mid \text{"Tail"}] = \theta$

$S[H_1 \mid \text{"Head"}] = S[H_1 \mid \text{"Tail"}] = \frac{1}{2}$ [The support measure for fairness]
$S[H_2 \cup H_3 \mid \text{"Head"}] = S[H_2 \cup H_3 \mid \text{"Tail"}] = 1$ [The support measure for unfairness]

We see that the likelihood, as well as our "maxitive" support measure, puts its highest
support on the unfairness of the coin, whatever the result of the first toss. We
see also that there is an invariant structure in the model, concerning our parameter of
interest. Indeed, the minimal G-sufficient statistic with respect to $\varphi$, see Barnard (1963),
is $T(\text{"Head"}) = T(\text{"Tail"}) = T(\text{"Head" or "Tail"})$. Thus, from $\mathcal{R}$
(or $\mathcal{PS}$) and the fact that the marginal likelihood
$l(\theta \mid \text{"Head" or "Tail"}) = 1$, $\forall \theta \in \Theta$, we should have:

$S[H_1 \mid \text{"Head"}] = S[H_1 \mid \text{"Tail"}] = S[H_1 \mid \text{"Head" or "Tail"}] = 1$
$S[H_2 \cup H_3 \mid \text{"Head"}] = S[H_2 \cup H_3 \mid \text{"Tail"}] = S[H_2 \cup H_3 \mid \text{"Head" or "Tail"}] = 1$

This is clearly a better situation in terms of inferential support, as the first toss of a coin
gives no information at all about the fairness or unfairness of a coin.
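The two sets of support values in this coin example can be reproduced numerically. The sketch
below is illustrative only: the function names are hypothetical, and the maxitive support of a
set of hypotheses is taken to be the largest likelihood in the set divided by the largest
likelihood over all three hypotheses, as in the definition above. The outcome "Head or Tail"
stands for the marginal likelihood through the statistic that does not distinguish the two faces.

```python
from fractions import Fraction

# theta = P("Head") under each hypothesis.
THETAS = {"H1": Fraction(1, 2), "H2": Fraction(1), "H3": Fraction(0)}


def likelihood(theta, outcome):
    """Likelihood of theta for a single toss, or for the uninformative marginal outcome."""
    if outcome == "Head":
        return theta
    if outcome == "Tail":
        return 1 - theta
    if outcome == "Head or Tail":
        return Fraction(1)
    raise ValueError(f"unknown outcome: {outcome}")


def support(hypotheses, outcome):
    """Maxitive support: sup of the likelihood over the set, relative to the overall sup."""
    overall = max(likelihood(t, outcome) for t in THETAS.values())
    return max(likelihood(THETAS[h], outcome) for h in hypotheses) / overall


for outcome in ("Head", "Tail", "Head or Tail"):
    print(outcome, support(["H1"], outcome), support(["H2", "H3"], outcome))
# Head 1/2 1
# Tail 1/2 1
# Head or Tail 1 1
```

With a single observed face, fairness gets support 1/2 and unfairness 1; with the marginal
likelihood both get support 1, which is the discrepancy discussed in the text.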

This example shows the impossibility for a "maxitive" support measure to satisfy the
Reduction principle, as the latter, through structural invariance and partial sufficiency,
introduces the marginal likelihood function into the scene of inference, and therefore an
additive operation in terms of likelihood. Moreover, as we observed in the coin example, single
and marginal likelihood functions can express totally different support with respect to the
hypotheses. Which one should we prefer? Our choice will, de facto, contradict either the
Reduction or the Likelihood principle.

3.6. Summary 

In this section, we have seen that neither the additivity nor the maxitivity approach is 
"the" solution for our support measure. The first one cannot handle properly the ignorance 
present in any statistical problem, while the second one cannot cope with its structural 
invariance (for instance, a location-scale structure emerging with the asymptotic normal 
model when the number of observations increases) . 

• The additive Bayesian posterior approach satisfies the Likelihood $\mathcal{L}$ and Reduction
$\mathcal{R}$ principles, but not the Invariance $\mathcal{I}$ principle. $\mathcal{R}$ will be valid
under the condition that a reference parameterization is chosen as well as a proper prior
distribution over the parameter space $\Theta$. This extra information is required if paradoxes
are to be avoided (Dawid et al. (1973); Stone (1976)).



• The maxitive Maximized or Profile Likelihood approach (Barndorff-Nielsen and Cox (1994))
satisfies the Invariance $\mathcal{I}$ and Likelihood $\mathcal{L}$ principles, but not the
Reduction $\mathcal{R}$ principle. The Generalized Likelihood Ratio tests are based on this type
of support measure, as well as the Maximum Likelihood point estimation. The extra information
needed here to avoid paradoxes is the long-run behavior of the model, as well as its structural
invariance (Stein (1956); Barnard (1963); Berger and Wolpert (1988)).



• The mixed additive-maxitive Marginal or Conditional Inference approaches have not
yet been considered in this paper. The Marginal (or Conditional) Likelihood approach
is defined in the same way as the Profile Likelihood approach, but using the
marginal [respectively conditional] likelihood function $l(\theta; T(x))$ [$l(\theta; x | T(x))$]
instead of the simple likelihood function $l(\theta; x)$, as we did in Bernardo's example. The
Invariance $\mathcal{I}$ and Reduction $\mathcal{R}$ principles will be satisfied here, but not the
Likelihood $\mathcal{L}$ principle. The problem here is the definition of what is a partial
nonformative statistic $T(x)$ (Barndorff-Nielsen (1978); Remon (1984); Cano Sanchez et al.
(1989); Zhu and Reid (1994); Barndorff-Nielsen and Cox (1994)). The use of a marginal or
conditional likelihood function requires extra information such as the knowledge of the
stopping rule, because the marginal or conditional density can differ from one stopping rule
to another. $\mathcal{L}$ is no longer valid for this type of inference.



Bayesian, Profile Likelihood and Marginal/Conditional Likelihood inferences are three
major approaches corresponding to the possible two-by-two combinations of our general
principles. Other inference methods can be classified in the same way, depending on the list of
principles they satisfy. But none will be able to satisfy all these principles, as there is an
internal incompatibility between them. This incompatibility can be seen as a dilemma
between an additive or a maxitive approach for dealing with the ignorance and the structural
information contained in the data.



4. Conclusions : the dilemma between "Additivity" and "Maxitivity" 

Our point of view is that discussion about Statistical Schools of Inference should not focus 
so much on the kind of principle one keeps or rejects, or even by-passes thanks to some well 
chosen extra information. Indeed, any inference theory seems to miss some information, 
as extra information is always needed to avoid paradoxes. Statisticians should be more 
aware of and worried by the mathematical properties of the support measure they wish to 
use. Here comes the dilemma between the "additivity" and the "maxitivity" of our support 
measure. 

One can think that most statisticians will prefer an additive approach, by similarity with 
the probability theory, but this is not so clear. Indeed, the core of the point estimation 
is done in a maxitive environment. And if they have to compare hypotheses, they will 
normally use likelihood ratios, which are based on maxitive support measures. We think 
that neither the maxitive nor the additive approach should be promoted as the sole possible 
approach. 

Our point of view is that statisticians should use both perspectives, in a dialogical process,
as in the Marginal/Conditional Likelihood approach. The EM algorithm (Dempster et al. (1977))
is also a good example of this combined use of "maxitive" and "additive" operators. This
double nature of the support measure is, for us, the characteristic of statistical inference,
as statisticians should consider themselves as staying in the middle of the road, trying
to reconcile the logic of ignorance (related to "Maxitivity") and the logic of information
(linked to "Additivity"). This is also the source of the efficiency of many statistical ad hoc
methods.

M. Remon